refactor(agent-service): redesign sync-execution result and error model by bobbai00 · Pull Request #6009 · apache/texera

bobbai00 · 2026-06-29T00:41:29Z

What changes were proposed in this PR?

Refactors the sync-execution result/error contract and updates its consumers.

Core wire-model changes:

OperatorInfo -> OperatorExecutionSummary
SyncExecutionResult -> WorkflowExecutionSummary
sampled rows use SampleRow { rowIndex, tuple } instead of embedding row metadata in tuple payloads
execution errors reuse the shared WorkflowFatalError shape used by compilation errors
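To show how the renamed pieces fit together, here is a minimal TypeScript sketch built from the names this PR uses: `OperatorExecutionSummary`, `WorkflowExecutionSummary`, `SampleRow`, `WorkflowFatalError`, and the `state` / `errorMessages` / `resultSummary` / `consoleLogsSummary` / `sampleTuples` / `errors` / `success` fields. It is not the actual contents of `types/execution.ts`. The `WorkflowFatalError` fields beyond `message`, the `operators` map, and the `collectAllErrors` helper are assumptions for illustration.

```typescript
// Hypothetical sketch of the new wire model. Type and field names come from the
// PR text where stated; everything marked "assumed" is illustrative only.

/** Shared error shape for compilation and execution errors (fields beyond `message` assumed). */
interface WorkflowFatalError {
  message: string;
  details?: string;
  operatorId?: string;
}

/** One sampled result row: its index in the full result, kept apart from the tuple payload. */
interface SampleRow {
  rowIndex: number;
  tuple: Record<string, unknown>;
}

/** Per-operator summary (replaces the flat OperatorInfo). */
interface OperatorExecutionSummary {
  state: string;
  errorMessages: WorkflowFatalError[]; // non-optional: an empty array means no errors
  resultSummary?: { sampleTuples: SampleRow[] };
  consoleLogsSummary?: unknown; // shape not spelled out in the PR text
}

/** Whole-run summary (replaces SyncExecutionResult). */
interface WorkflowExecutionSummary {
  state: string;
  success: boolean;
  operators: Record<string, OperatorExecutionSummary>; // keyed by operator ID (assumed)
  errors: WorkflowFatalError[]; // non-optional: an empty array means no errors
}

// Workflow-level and per-operator errors share one type and both arrays are always
// present, so a consumer can merge them without optional chaining or casts.
function collectAllErrors(summary: WorkflowExecutionSummary): WorkflowFatalError[] {
  return [
    ...summary.errors,
    ...Object.values(summary.operators).flatMap((op) => op.errorMessages),
  ];
}
```

Sharing one error type is what lets the merge be a plain array concatenation; with separate compile and execution error types, the helper would need a mapping step first.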

Updated the Scala sync-execution producer to emit the new shape, and updated the agent-service result formatting/state/server paths and the frontend operator-result mapping to consume it.
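The PR description notes that the frontend re-flattens `sampleTuples` so its display components stay unchanged. A rough sketch of that mapping follows, assuming the flat display row is the tuple's fields plus a row-index column. The `__row_index__` key echoes the legacy embedded field named in the PR; whether the frontend reuses that name is an assumption, as is the `toDisplayRows` name.

```typescript
// Hypothetical sketch: convert canonical SampleRow[] into flat display rows.
interface SampleRow {
  rowIndex: number;
  tuple: Record<string, unknown>;
}

type DisplayRow = Record<string, unknown>;

// Assumed name for the index column in the flat display shape.
const ROW_INDEX_KEY = "__row_index__";

function toDisplayRows(rows: SampleRow[]): DisplayRow[] {
  // Spread the tuple first, then write the index. In this flat shape a real column
  // named ROW_INDEX_KEY would be overwritten; keeping rowIndex outside the tuple on
  // the wire avoids that collision until the data reaches the display layer.
  return rows.map(({ rowIndex, tuple }) => ({ ...tuple, [ROW_INDEX_KEY]: rowIndex }));
}
```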

Any related issues, documentation, discussions?

Closes #5750.
Part of #5747.
Supersedes #5927.

How was this PR tested?

cd agent-service && bun run typecheck
cd agent-service && bun test src/agent/tools/result-formatting.spec.ts src/agent/tools/tools-utility.spec.ts src/agent/tools/workflow-execution-tools.spec.ts src/server.spec.ts
cd frontend && yarn ng test --watch=false --include src/app/workspace/service/agent/agent.service.spec.ts
sbt "WorkflowExecutionService / testOnly org.apache.texera.web.resource.SyncExecutionResourceSpec"

Was this PR authored or co-authored using generative AI tooling?

Generated-by: Claude Opus 4.8 (1M context), Claude Fable 5, Codex

github-actions · 2026-06-29T00:41:46Z

Automated Reviewer Suggestions

Based on the git blame history of the changed files, we recommend the following reviewers:

Contributors with relevant context: @Ma77Ball, @Yicong-Huang
You can notify them by mentioning @Ma77Ball, @Yicong-Huang in a comment.

codecov-commenter · 2026-06-29T00:42:50Z

Codecov Report

❌ Patch coverage is 91.05431% with 28 lines in your changes missing coverage. Please review.
✅ Project coverage is 60.61%. Comparing base (5c4a963) to head (27a267c).
⚠️ Report is 8 commits behind head on main.

Files with missing lines	Patch %	Lines
...he/texera/web/resource/SyncExecutionResource.scala	77.58%	23 Missing and 3 partials ⚠️
...agent-interaction/agent-interaction.component.html	66.66%	0 Missing and 2 partials ⚠️

Additional details and impacted files

@@             Coverage Diff              @@
##               main    #6009      +/-   ##
============================================
+ Coverage     59.11%   60.61%   +1.49%     
- Complexity     3201     3239      +38     
============================================
  Files          1132     1133       +1     
  Lines         43681    43457     -224     
  Branches       4734     4715      -19     
============================================
+ Hits          25821    26340     +519     
+ Misses        16430    15670     -760     
- Partials       1430     1447      +17

Flag	Coverage Δ		*Carryforward flag
access-control-service	`70.00% <ø> (ø)`		Carriedforward from a14a5d0
agent-service	`57.12% <100.00%> (+12.53%)`	⬆️
amber	`64.11% <77.58%> (+0.99%)`	⬆️	Carriedforward from a14a5d0
computing-unit-managing-service	`0.00% <ø> (ø)`		Carriedforward from a14a5d0
config-service	`52.30% <ø> (ø)`		Carriedforward from a14a5d0
file-service	`62.81% <ø> (ø)`		Carriedforward from a14a5d0
frontend	`51.73% <96.07%> (+0.41%)`	⬆️
notebook-migration-service	`78.57% <ø> (ø)`		Carriedforward from a14a5d0
pyamber	`91.19% <ø> (ø)`		Carriedforward from a14a5d0
workflow-compiling-service	`55.14% <ø> (ø)`		Carriedforward from a14a5d0

*This pull request uses carry forward flags.

github-actions · 2026-06-29T00:47:27Z

⚠️ Benchmark changes need a look

🟢 2 better · 🔴 4 worse · ⚪ 9 noise (<±5%) · 0 without baseline

Compared against main 6de8a48 benchmarked on this same runner, so the delta is largely free of cross-runner hardware noise. The "7d avg" column still reflects the gh-pages dashboard. Treat <±5% as noise unless repeated.


	config	throughput (tuples/sec)	MB/s	latency p50/p95/p99	max Δ latest / 7d
🔴	bs=10 sw=10 sl=64	352	0.215	27,280/36,228/36,228 us	🟢 -27.2% / 🔴 +139.2%
🔴	bs=100 sw=10 sl=64	746	0.455	133,787/157,815/157,815 us	🔴 +6.9% / 🔴 +46.2%
⚪	bs=1000 sw=10 sl=64	899	0.549	1,113,349/1,201,348/1,201,348 us	⚪ within ±5% / 🔴 +16.8%

Baseline details

Latest main 6de8a48 from same runner

config	metric	PR	latest main	7d avg	Δ latest	Δ 7d
bs=10 sw=10 sl=64	throughput	352 tuples/sec	367 tuples/sec	776.36 tuples/sec	-4.1%	-54.7%
bs=10 sw=10 sl=64	MB/s	0.215 MB/s	0.224 MB/s	0.474 MB/s	-4.0%	-54.6%
bs=10 sw=10 sl=64	p50	27,280 us	23,001 us	12,636 us	+18.6%	+115.9%
bs=10 sw=10 sl=64	p95	36,228 us	49,761 us	15,143 us	-27.2%	+139.2%
bs=10 sw=10 sl=64	p99	36,228 us	49,761 us	18,954 us	-27.2%	+91.1%
bs=100 sw=10 sl=64	throughput	746 tuples/sec	793 tuples/sec	985.33 tuples/sec	-5.9%	-24.3%
bs=100 sw=10 sl=64	MB/s	0.455 MB/s	0.484 MB/s	0.601 MB/s	-6.0%	-24.3%
bs=100 sw=10 sl=64	p50	133,787 us	125,121 us	101,671 us	+6.9%	+31.6%
bs=100 sw=10 sl=64	p95	157,815 us	155,281 us	107,939 us	+1.6%	+46.2%
bs=100 sw=10 sl=64	p99	157,815 us	155,281 us	113,798 us	+1.6%	+38.7%
bs=1000 sw=10 sl=64	throughput	899 tuples/sec	911 tuples/sec	1,016 tuples/sec	-1.3%	-11.6%
bs=1000 sw=10 sl=64	MB/s	0.549 MB/s	0.556 MB/s	0.62 MB/s	-1.3%	-11.5%
bs=1000 sw=10 sl=64	p50	1,113,349 us	1,101,247 us	989,693 us	+1.1%	+12.5%
bs=1000 sw=10 sl=64	p95	1,201,348 us	1,159,407 us	1,028,327 us	+3.6%	+16.8%
bs=1000 sw=10 sl=64	p99	1,201,348 us	1,159,407 us	1,059,969 us	+3.6%	+13.3%

Raw CSV

config_idx,batch_size,schema_width,string_len,num_batches,total_ms,total_tuples,total_bytes,tuples_per_sec,mb_per_sec,lat_p50_us,lat_p95_us,lat_p99_us
0,10,10,64,20,567.76,200,128000,352,0.215,27279.54,36227.58,36227.58
1,100,10,64,20,2682.33,2000,1280000,746,0.455,133787.43,157814.55,157814.55
2,1000,10,64,20,22242.44,20000,12800000,899,0.549,1113348.51,1201348.25,1201348.25

### What changes were proposed in this PR?

Restructures the per-operator summary that the sync-execution backend returns and that the agent-service and frontend consume, giving a leaner, consistent wire contract. This is a focused redo of apache#5927, cut directly from `main` (no foundation stack): it changes only the execution result/error model and its consumers.

- Replace the flat `OperatorInfo` with `OperatorExecutionSummary` (orthogonal sub-summaries: `state`, `errorMessages`, `resultSummary?`, `consoleLogsSummary?`); rename `SyncExecutionResult` → `WorkflowExecutionSummary`.
- `resultSummary.sampleTuples` is now `SampleRow[]` (`{ rowIndex, tuple }`) instead of JSON rows with an embedded `__row_index__`; drop the table-shape types (the agent derives input-port shapes from the DAG).
- Move `WorkflowFatalError` into `types/execution.ts` and reuse it for per-operator errors. It is the same type the workflow-compiling service returns for compilation errors, so compile and execution errors share one wire shape; `api/compile-api.ts` re-exports it so its existing importers are unchanged.
- `errorMessages` / `errors` are non-optional (empty = none); drop `compilationErrors`; collapse the console-message types and derive warnings from `WARNING:`-titled messages.
- Operator results are still pulled on demand via `GET /agents/:id/operator-results` (transport unchanged). That REST payload now carries the canonical `OperatorExecutionSummary`, and the frontend maps it to its flat display type (re-flattening `sampleTuples` so the display components are unchanged).

Touches the Scala producer (`SyncExecutionResource`), the agent-service consumers (`result-formatting`, `workflow-execution-tools`, `workflow-result-state`, `server`), and the frontend mapping. The change is representation/type-level and preserves behavior (input-port shape lines are now derived rather than explicitly rendered).

### Any related issues, documentation, discussions?

Closes apache#5750.
Part of apache#5747.
Supersedes apache#5927.
### How was this PR tested?

- agent-service: `tsc --noEmit` clean, `bun test` 110/110 pass, `prettier --check` clean.
- The Scala producer (`SyncExecutionResource`) is unchanged from apache#5927, which verified it via `sbt WorkflowExecutionService/compile` and a full-stack end-to-end run: a Claude Haiku 4.5 agent built and executed a CSV workflow, and `/operator-results` returned the new shape (`resultSummary.sampleTuples: [{ rowIndex, tuple }]`, `errorMessages: []`).

### Was this PR authored or co-authored using generative AI tooling?

Generated-by: Claude Opus 4.8 (1M context)

Add unit tests for the redesigned sync-execution result/error model to bring patch coverage to 100%:

- workflow-execution-tools: drive executeOperatorAndFormat and createExecuteOperatorTool across pre-flight guards, successful runs (shape/warnings/gaps/cell types/truncation), execution failures (FAILED/KILLED/ERROR, per-operator and general errors), abort propagation, and callback failures.
- texera-agent: exercise getFormattedResultsForDAG over the visible result branch.
- workflow-result-state: cover getOperatorInfo.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Wrap long tuple/list expressions to the 100-column limit and drop a stray blank line so scalafmtCheckAll passes.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

- agent-interaction: unit-test the visualization html caching, column and row derivation (ellipsis on index gaps) via direct construction with stubbed services.
- result-table-frame: cover setupResultTable populating currentResult, columns, and totalNumTuples for non-empty data.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
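The "ellipsis on index gaps" row derivation works because sampled rows keep their original `rowIndex`, so a display can mark where rows were skipped. A minimal sketch follows, assuming rows arrive as `SampleRow` objects and the display renders a `"..."` marker. `withGapMarkers` and the marker value are hypothetical, not the component's actual code.

```typescript
interface SampleRow {
  rowIndex: number;
  tuple: Record<string, unknown>;
}

// Hypothetical marker the display renders in place of skipped rows.
const GAP = "..." as const;
type DisplayItem = SampleRow | typeof GAP;

// Sort by original position, then insert GAP wherever two consecutive sampled
// rows are not adjacent in the full result (e.g. between a head and a tail window).
function withGapMarkers(rows: SampleRow[]): DisplayItem[] {
  const sorted = [...rows].sort((a, b) => a.rowIndex - b.rowIndex);
  const out: DisplayItem[] = [];
  sorted.forEach((row, i) => {
    if (i > 0 && row.rowIndex - sorted[i - 1].rowIndex > 1) {
      out.push(GAP);
    }
    out.push(row);
  });
  return out;
}
```

Only gaps between sampled rows are marked here; whether a leading gap (first `rowIndex` greater than 0) should also get a marker is a display choice this sketch leaves out.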

Add SyncExecutionResourceSpec covering the unit-testable changed lines: the new result/error summary case classes and handleExecutionError (all compilation-error branches plus the unknown-error fallback), via PrivateMethodTester so no production visibility changes are needed. The remaining changed lines (executeWorkflowSync orchestration, collectOperatorInfos, and the collectOperatorResult truncation loop) require a live Pekko execution and DB/Iceberg-backed result documents; they are exercised by the integration suite, which is out of unit-test scope.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…rs for testing

Extract two behavior-preserving pure helpers so the result sampling and per-operator summary logic can be unit-tested without a live engine:

- sampleAndTruncateTuples(tupleIterator, totalCount, ...): the symmetric result truncation / sampling previously inlined in collectOperatorResult.
- buildOperatorExecutionSummary(...): the per-operator summary + console error extraction previously inlined in collectOperatorInfos.

Both are pure code moves (identical expressions); call sites delegate to them. Expand SyncExecutionResourceSpec to cover every branch of both (empty/visualization/front-only/oversized/sliding-window truncation, and result/console-error/no-result summaries).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
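The symmetric truncation that `sampleAndTruncateTuples` encapsulates keeps rows from both ends of a large result. As an illustration only, here is a single-pass TypeScript re-expression of that idea; the real helper is Scala, and its window sizes are not given in the PR. The front window fills first, and a sliding back window retains the most recent rows. The `sampleHeadTail` name, the even split between windows, and the `maxRows` parameter are assumptions.

```typescript
interface SampleRow {
  rowIndex: number;
  tuple: Record<string, unknown>;
}

// Hypothetical single-pass head/tail sampler. If the result fits within maxRows,
// every row is kept; otherwise about half come from the front and the rest from
// a sliding window over the end of the stream.
function sampleHeadTail(
  tuples: Iterable<Record<string, unknown>>,
  totalCount: number,
  maxRows: number,
): SampleRow[] {
  const front: SampleRow[] = [];
  const back: SampleRow[] = []; // sliding window holding the latest rows seen
  if (totalCount <= 0 || maxRows <= 0) return front;

  const fits = totalCount <= maxRows;
  const frontSize = fits ? maxRows : Math.ceil(maxRows / 2);
  const backSize = fits ? 0 : maxRows - frontSize;

  let rowIndex = 0;
  for (const tuple of tuples) {
    const row: SampleRow = { rowIndex, tuple };
    if (front.length < frontSize) {
      front.push(row);
    } else if (backSize > 0) {
      back.push(row);
      if (back.length > backSize) back.shift(); // evict the oldest row in the window
    } else {
      break; // front window full and no back window: stop reading
    }
    rowIndex++;
  }
  // An iterator that ends early (fewer rows than totalCount) just yields fewer samples.
  return [...front, ...back];
}
```

`back.shift()` costs time proportional to the window size; a ring buffer would avoid that, but for small preview windows the simpler array is easier to read.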

…ting

Factor the final WorkflowExecutionSummary assembly (fatal-error formatting, operator-console-error detection, state string, and success determination) out of executeWorkflowSync into a pure assembleExecutionSummary helper. Behavior-preserving code move; the live observable-wait / termination handling stays inline. Add unit tests covering the state/success derivation across terminal, console-error, target-results-override, operator-error, and fatal-error cases (SyncExecutionResourceSpec now 29 tests).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

… partials

Add two unit cases to close branch gaps in the extracted helpers:

- sampleAndTruncateTuples: empty iterator with a positive count (the `!tupleIterator.hasNext` half of the guard).
- buildOperatorExecutionSummary: a console ERROR whose message is non-empty but shorter than the title (the length-comparison false branch), so the title is kept.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
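The second case pins down how a console ERROR becomes error text. Here is a hypothetical TypeScript restatement of that rule, assuming the message replaces the title only when it is strictly longer (the PR names the branch but not the exact comparison) and assuming a `ConsoleMessage` shape with `msgType`, `title`, and `message` fields. `consoleErrorTexts` is an illustrative name.

```typescript
// Assumed console-message shape; field names are illustrative.
interface ConsoleMessage {
  msgType: string; // e.g. "ERROR" or "PRINT"
  title: string;
  message: string;
}

// For each ERROR, prefer the (usually more detailed) message body, but fall back to
// the title when the message is empty or shorter than the title. An empty message is
// never longer than the title, so the length test also covers the emptiness check.
function consoleErrorTexts(messages: ConsoleMessage[]): string[] {
  return messages
    .filter((m) => m.msgType === "ERROR")
    .map((m) => (m.message.length > m.title.length ? m.message : m.title));
}
```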

Shrink the sampling helpers' return to the three consumed values, extract the duplicated back-window loop into collectBackWindow, and drop the constructor-echo tests in favor of the behavioral suites.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

getOperatorErrorText pins down the single definition of "operator failed" that the new errorMessages type calls for (a fatal error carrying message text), replacing the divergent .length and joined-truthiness checks. In the failure branch, surface per-operator errors for any terminal state (success=false can come with state "Completed"), label CompilationFailed results distinctly, and preserve the Killed state on timeout instead of collapsing it to Failed.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
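The "operator failed" rule described here (a fatal error carrying message text) fits in one small helper. This sketch is modeled on, not copied from, the PR's `getOperatorErrorText`; the minimal type shapes, the newline join, and the `operatorErrorText` name are assumptions.

```typescript
interface WorkflowFatalError {
  message: string; // other fields omitted in this sketch
}

interface OperatorExecutionSummary {
  state: string;
  errorMessages: WorkflowFatalError[];
}

// Returns the operator's combined error text, or undefined if it did not fail.
// "Failed" means at least one error carries non-empty message text, so an entry
// with an empty message does not count; a bare `.length > 0` check would disagree.
function operatorErrorText(op: OperatorExecutionSummary): string | undefined {
  const text = op.errorMessages
    .map((e) => e.message)
    .filter((m) => m.length > 0)
    .join("\n");
  return text.length > 0 ? text : undefined;
}
```

Routing every caller through one function like this keeps the length-based and text-based checks from drifting apart, which is the divergence the commit removes.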

…tion in tests

Import the existing WorkflowFatalError from workflow-websocket.interface instead of redeclaring it in agent.service.ts, and rewrite the agent-interaction spec onto a TestBed fixture with detectChanges() so the changed template is actually rendered and covered.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

github-actions Bot assigned bobbai00 Jun 29, 2026

github-actions Bot added the engine, refactor, frontend, and agent-service labels Jun 29, 2026

bobbai00 mentioned this pull request Jun 29, 2026

refactor(agent-service): redesign sync-execution result and error model #5927

Closed

bobbai00 marked this pull request as draft June 29, 2026 01:00

bobbai00 force-pushed the refactor/sync-execution-result-model branch from 89eb9e9 to 84022c9 Compare June 29, 2026 04:56

bobbai00 marked this pull request as ready for review June 29, 2026 08:25

bobbai00 force-pushed the refactor/sync-execution-result-model branch from 43166b9 to 66d9785 Compare July 2, 2026 09:18

bobbai00 requested a review from Yicong-Huang July 2, 2026 09:25

bobbai00 and others added 15 commits July 4, 2026 15:41

refactor(frontend): consume execution summaries directly

316f456

refactor(agent-service): keep operator result summary name

cf39fa3

refactor: align execution result summary contract

782432c

refactor: remove legacy result marker fields

c19569f

style(execution-service): apply scalafmt to SyncExecutionResource

639152b


bobbai00 force-pushed the refactor/sync-execution-result-model branch from 8e8a9eb to 392f4de Compare July 5, 2026 00:22

refactor(agent-service): dedupe result row formatting

e2c4475

bobbai00 mentioned this pull request Jul 5, 2026

refactor(agent-service): redesign the sync-execution result and error model #5750

Open

6 tasks

bobbai00 added 2 commits July 4, 2026 21:29

refactor(agent-service): flatten console message summaries

a14a5d0

refactor(agent-service): reuse fatal errors for workflow errors

27a267c

bobbai00 wants to merge 18 commits into apache:main from bobbai00:refactor/sync-execution-result-model

2 participants
